Predictive Modeling in the Insurance Industry Using SAS Software

Author

  • Terry J. Woodfield
Abstract

Insurance companies, third-party insurance administrators, state insurance funds, state regulatory agencies, insurance industry consultants, and application service providers (ASPs) use SAS software in a variety of predictive modeling situations. SAS Enterprise Miner provides tools for modeling continuous responses, such as monetary losses and time off work, and discrete responses, such as fraud, third-party subrogation, and litigation. Enterprise Miner includes a rich set of menu-driven modeling solutions as well as an underlying array of SAS procedures for modeling insurance data. This paper addresses two problems: predicting losses and detecting fraud. Two possible solutions are presented for each problem, and additional solutions are suggested. The paper is intended for actuaries, business analysts, and insurance managers who are interested in solving predictive modeling problems in the insurance industry.

INTRODUCTION

Insurance companies, third-party insurance administrators, state insurance funds, state regulatory agencies, insurance industry consultants, and application service providers (ASPs) use SAS software in a variety of predictive modeling situations. For property and casualty insurance, setting premiums requires an assessment of risk that leads to estimates of loss for a book of business. To improve case management and reduce bulk reserves, insurers have turned to predictive models for predicting individual claim losses. For workers’ compensation insurance, loss prediction includes predicting time off work and total medical expenses. This paper addresses the prediction of medical expenses at the claim level. The problem of predicting losses is exacerbated by the increase in fraud by claimants and by health care providers. The problem of claimant fraud detection is also addressed in this paper.
Speights et al. (1999b) address the problem of profiling health care providers, which may indirectly contribute to solving the provider fraud problem.

Statisticians translate business problems into a statistical or mathematical formulation, and then apply appropriate statistical modeling techniques to help solve the original business problem. For the problem of predicting insurance losses, the equivalent statistical problem is predicting continuous responses (targets, outcomes) given a set of predictor variables (inputs, independent variables). Such problems can be solved using linear or nonlinear regression models, or computer-intensive models like neural networks or regression trees. When censoring occurs, survival models are appropriate, and many conventional modeling approaches accommodate censoring, including neural network modeling (DeLaurentis and Ravdin 1994; Faraggi et al. 1995; Biganzoli et al. 1998; Speights et al. 1999a) and decision tree modeling (Ciampi, Negassa, and Lou 1995). Fraud prediction is a binary response problem, which may be addressed using logistic regression, decision trees, or neural networks. SAS/STAT software supports a variety of nonlinear regression models for analyses of censored data.

One of the challenges of predictive modeling in insurance is obtaining data that can be used to build a predictive model. Woodfield (1994) addresses some of the pitfalls that occur when obtaining and preparing data for modeling. Pyle (1999) gives an excellent overview of data preparation for the general data mining problem. For this paper, hypothetical simulated data will be used to illustrate the techniques employed.

CLAIMANT FRAUD

When a worker is injured while performing the duties of his or her job, the workers’ compensation insurance system assumes that the employer is liable, while also placing limits on that liability depending on the circumstances.
The injured party whose accident is covered by an insurance policy is called the claimant. A claimant can commit fraud in a number of ways. The claimant can fake an accident. The claimant can exaggerate the severity of the injury. The claimant can misrepresent the injury so that it appears to be covered, such as the worker who breaks a leg skiing on Sunday, then reports to work on Monday and within a few minutes claims to have fallen and broken the leg on the job. The claimant can conspire with a health care provider to misrepresent when health care was received, the magnitude of the injury, and the nature of the health care received.

Property and casualty insurance covers a wide spectrum of accidents, including coverage for auto liability bodily injury and workers’ compensation medical expenses. When building models to detect fraud, the specific coverage is important, and it is rarely possible to mix data from different insurance lines. Auto liability bodily injury usually involves a one-time payment of a settlement amount that is negotiated based on health care expenses incurred to date and predicted future expenses. Workers’ compensation coverage usually provides for settlement of medical expenses within 45 days of when they are billed, and may include a one-time settlement that is negotiated based on anticipated future expenses. An auto liability settlement is usually a cash payment, whereas a workers’ compensation settlement may include an annuity. Auto liability policies usually stipulate maximum coverage, for example, $1 million per accident and $100,000 per person. Workers’ compensation usually has no limit on medical expenses, although limits are always placed on wage replacement with respect to weekly compensation. These facts will be especially relevant when you consider loss reserving models later.

Statistical models for fraud address two situations. In the first, little or no data is available to target specific claims as having been fraudulent.
This becomes an outlier detection problem. In the second situation, data is available in which each claim has one of three flags: (0) definitely no fraud, (1) definitely fraudulent, and (-1) fraud status is unknown. For the second situation, the unknown (-1) claims are usually handled in one of two ways. The first is simply to treat unknown as no fraud. The second is to treat the unknown claims as in the first situation: identify outliers, classify some of the outliers as fraud, and proceed with only targets (0) and (1) in the data. The problem of unknowns can also be addressed as a reject inference problem, with the usual strategy being to randomly select claims and investigate all of them, trying to predict when an unknown claim would have been determined to be fraudulent. In this paper, fraud detection will be addressed using a binary response model, but you should be aware that there are often more complex modeling issues that must be addressed.

USING SAS ENTERPRISE MINER FOR FRAUD DETECTION

To illustrate fraud detection using Enterprise Miner, a simulated fraud data set is employed. As with any data mining problem, at least 50% of the project will be devoted to data collection, data processing, and data repair. Some of the data repair tasks include imputing missing or invalid observations. The initial

Advanced Tutorials
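
The two ways of reducing the three fraud flags to a binary target, described in the CLAIMANT FRAUD section, can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's method: it is written in Python rather than SAS, the claim records and the `billed_amount` field are hypothetical, and the outlier rule (a z-score against the known no-fraud claims with a cutoff of 3) is an assumed stand-in for whatever outlier detection an analyst would actually use.

```python
from statistics import mean, pstdev

# Hypothetical simulated claims: (claim_id, billed_amount, fraud_flag).
# fraud_flag: 0 = definitely no fraud, 1 = definitely fraudulent, -1 = unknown.
CLAIMS = [
    ("C01", 1200.0, 0),
    ("C02", 900.0, 0),
    ("C03", 15000.0, 1),
    ("C04", 1100.0, -1),
    ("C05", 1000.0, -1),
    ("C06", 20000.0, -1),
    ("C07", 1300.0, 0),
    ("C08", 950.0, -1),
]


def unknown_as_no_fraud(claims):
    """Strategy 1: recode every unknown (-1) flag as no fraud (0)."""
    return {cid: (1 if flag == 1 else 0) for cid, _, flag in claims}


def unknown_via_outliers(claims, z_cutoff=3.0):
    """Strategy 2: unknown claims that look like outliers become fraud (1).

    Assumption: an unknown claim is an outlier when its billed amount lies
    more than z_cutoff standard deviations from the mean of the claims
    known to be clean. All other unknown claims are recoded as 0.
    """
    clean = [amount for _, amount, flag in claims if flag == 0]
    mu, sigma = mean(clean), pstdev(clean)
    target = {}
    for cid, amount, flag in claims:
        if flag in (0, 1):
            target[cid] = flag  # known flags are kept unchanged
        else:
            z = abs(amount - mu) / sigma if sigma > 0 else 0.0
            target[cid] = 1 if z > z_cutoff else 0
    return target


if __name__ == "__main__":
    print("unknown as no fraud:", unknown_as_no_fraud(CLAIMS))
    print("unknown via outliers:", unknown_via_outliers(CLAIMS))
```

Either function returns a target containing only 0 and 1, which is the form a binary response model such as logistic regression expects. The two strategies differ only in how the unknown claims are labeled, so comparing models fit to each target gives a rough sense of how sensitive the results are to that choice.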





Publication date: 2001